Introduction to Neural Networks & Architecture



What is a neural network?

First, let's look at Baidu's definition of a neural network:

An artificial neural network (Artificial Neural Network, ANN) has been a hot research topic in the field of artificial intelligence since the 1980s. It abstracts the neural network of the human brain from the perspective of information processing, establishes a simple model, and forms different networks according to different connection schemes. In engineering and academia it is often referred to simply as a neural network. A neural network is a computational model consisting of a large number of interconnected nodes (or neurons). Each node represents a specific output function, called the activation function. Each connection between two nodes carries a weighted value for the signal passing through it, called a weight, which serves as the memory of the artificial neural network. The output of the network differs depending on the connection scheme, the weight values, and the activation function. The network itself is usually an approximation of some algorithm or function in nature, or it may be the expression of a logical strategy.

As the name implies, neural networks try to simulate the way brain neurons work in order to complete tasks such as information processing and pattern recognition, for example image recognition and speech recognition. These tasks feel effortless to the brain, but in computer science and machine learning they are very complex and challenging. Because no program can be written directly to reproduce the complex behavior of neurons, researchers instead design and train neural networks to approximate these functions.


How do neural networks recognize images?

To illustrate, let's take a simple cat-vs-dog classification problem (i.e., binary classification) and see how it works.

Here are two black-and-white pictures, one of a cat and one of a dog, each 256 by 256 pixels. Think about how your brain distinguishes between them.
[Figure: two 256×256 black-and-white photos, one of a cat and one of a dog]
First, the brain uses visual perception to identify different features of the image, such as a cat's ears and whiskers, or a dog's nose and mouth. Based on these features, the brain assigns a similarity to cat or dog (a value from 0% to 100%) and finally gives its judgment: cat or dog.

Similarly, a neural network should have a comparable structure: it must be able to extract and recognize image features and finally output the similarity to each class (in this case there are only two classes, cat and dog).

We now give a simple neural network structure for cat-and-dog recognition (here a fully connected neural network, FCNN, is used, not a CNN).
If you have ever studied neural networks, the diagram below should look familiar.

[Figure: diagram of a simple fully connected neural network]

But if you are seeing it for the first time, don't worry: I will use cat-and-dog recognition as an example to explain the specific meaning of each item in this picture.

First, we flatten the picture, splicing its rows together to form a single column of 256*256 = 65536 pixels. We call each element in this column a neuron, and the column as a whole the input layer (i.e., the data of the input image).

[Figure: the 256×256 image flattened into a single column of input-layer neurons]

Obviously, it has 256*256 = 65536 neurons. For now you can think of them as the photoreceptor cells of our eyes, transmitting the raw picture information to the brain. Assume each neuron holds a value in the range 0.0 to 1.0 (called its activation value) representing the brightness of its pixel: the closer to 1.0, the brighter the pixel; the closer to 0.0, the darker.
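To make this concrete, here is a minimal sketch in Python with NumPy; the random image below is just a stand-in for real pixel data:

import numpy as np

# Stand-in for a 256x256 grayscale picture with brightness in [0.0, 1.0];
# in practice you would load real pixel data with an image library.
img = np.random.rand(256, 256)

# Flatten row by row into one row vector of 256*256 = 65536 activation values.
X = img.reshape(1, 256 * 256)
print(X.shape)  # (1, 65536)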

To identify whether the image is a cat or a dog, the brain picks out various features such as the animal's eyes, nose, ears, and so on. So we also need a hidden layer (there can be more than one; for convenience we use just one here), which imitates how the brain recognizes the various features of the picture.

[Figure: the hidden layer, shown here with three neurons]

As shown in the figure above, again for convenience, we give the hidden layer only three neurons, and assume they represent the similarity to three features: a dog's ears, a cat's ears, and a cat's eyes (a real neural network is often a black-box model; that is, we do not know which neuron corresponds to which feature).

Based on the similarity values of the hidden-layer neurons, we can calculate the probabilities of cat and dog: the more cat features an image has, the more likely it is to be a cat, and vice versa. This is a binary classification problem, so we usually use the softmax function to keep the output probabilities within the range 0.0 to 1.0, which makes the judgment easy. In the output shown in the figure above, the probability of a cat is 0.9 and the probability of a dog is 0.1.
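As a quick sanity check of this idea, here is a small sketch of the softmax computation; the raw scores are hypothetical values chosen only so that the result comes out near 0.9 and 0.1:

import numpy as np

def softmax(z):
    # Subtracting the max is a standard trick for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw scores for (cat, dog); log(9) is chosen so that
# the probabilities come out as roughly (0.9, 0.1).
scores = np.array([np.log(9.0), 0.0])
print(softmax(scores))  # approximately [0.9 0.1]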

So what do these lines do?

[Figure: the connecting lines between the layers of the network]

These lines represent the flow of data between layers; that is, they show which values from the previous layer each value in the next layer is calculated from.


Use mathematical expressions to describe the structure of neural networks

After reading the previous chapter, you should understand the general principle and structure of neural networks, so let's now discuss how the values of each layer of neurons are obtained. In this chapter we will also discuss how to describe the structure with mathematical expressions.
Note: this chapter uses some linear algebra. If you don't know what a matrix is, you'd better go look it up on Baidu first.

Calculation of neuron activation values from input layer to hidden layer

We assume that the neuron activation values of the input layer are x_1,x_2,x_3,\cdots,x_{65536}, which can be written as the matrix:

X=
\begin{pmatrix}
x_1 & x_2 & x_3 & \cdots & x_{65536} \\
\end{pmatrix}

where X represents the input values.
Similarly, all the activation values of the hidden layer obtained from the input layer can be expressed as a matrix:

H=
\begin{pmatrix}
h_1 & h_2 & h_3 \\
\end{pmatrix}

[Figure: weights w_{i,j} on the connections from the input layer to the hidden layer]

Each line in the figure represents multiplication by a weight (Weight), such as w_{1,1}. The activation value of the first neuron in the hidden layer can then be calculated by the following formula:

h_1=w_{1,1}*x_1+w_{2,1}*x_2+w_{3,1}*x_3+\cdots+w_{65536,1}*x_{65536}

To keep this value between 0.0 and 1.0 (for convenience of calculation), we also introduce a bias (Bias), such as b_1, and apply an activation function; here we use the sigmoid function, denoted \sigma(x):

\sigma(x)=\frac{1}{1+e^{-x}}
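For example, the sigmoid squashes any real input into the open interval (0.0, 1.0):

\sigma(0)=\frac{1}{1+e^{0}}=0.5, \qquad \sigma(x)\to 1 \text{ as } x\to+\infty, \qquad \sigma(x)\to 0 \text{ as } x\to-\infty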

At this point:

h_1=\sigma(w_{1,1}*x_1+w_{2,1}*x_2+w_{3,1}*x_3+\cdots+w_{65536,1}*x_{65536}+b_1)
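As a sketch, here is the same computation in Python with NumPy; the weights and bias are random stand-ins, since the real values only come from training:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n = 65536                      # number of input-layer neurons
x = np.random.rand(n)          # pixel brightness values in [0.0, 1.0]
w = np.random.randn(n) * 0.01  # stand-ins for the weights w_{1,1} ... w_{65536,1}
b1 = 0.0                       # stand-in for the bias b_1

# h_1 = sigma(w_{1,1}*x_1 + ... + w_{65536,1}*x_{65536} + b_1)
h1 = sigmoid(np.dot(w, x) + b1)
print(h1)  # a single activation value in (0.0, 1.0)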

So how do we express all the h_i (i=1,2,3) at once? As you may have guessed, they can be represented with matrices: a weight matrix (W) and a bias matrix (B):

W=
\begin{pmatrix}
w_{1,1} & w_{1,2} & w_{1,3} \\
w_{2,1} & w_{2,2} & w_{2,3} \\
\vdots & \vdots & \vdots \\
w_{65536,1} & w_{65536,2} & w_{65536,3} \\
\end{pmatrix}
B=
\begin{pmatrix}
b_1 & b_2 & b_3 \\
\end{pmatrix}

Then, from matrix multiplication, we get:

H=\sigma(X \cdot W + B)

In this way, we have fully solved the calculation of activation values from the input layer to the hidden layer.
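The matrix form translates almost line for line into NumPy. Again, W and B below are random stand-ins for trained parameters:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.random.rand(1, 65536)          # input row vector, shape (1, 65536)
W = np.random.randn(65536, 3) * 0.01  # weight matrix, shape (65536, 3)
B = np.zeros((1, 3))                  # bias row vector, shape (1, 3)

# H = sigma(X . W + B) gives the (1, 3) row of hidden activations.
H = sigmoid(X @ W + B)
print(H.shape, H)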

Formula relationships between layers

Following the formula relating the input layer to the hidden layer, we can write the relationship between the hidden layer and the output layer in the same way.

[Figure: the connections from the hidden layer to the output layer]

Let the two output values y_0,y_1 be the probabilities that the animal in the picture is a cat or a dog, respectively (obviously each probability lies between 0.0 and 1.0). We represent them with the matrix:

Y=
\begin{pmatrix}
y_0 & y_1 \\
\end{pmatrix}

Here, to guarantee that y_0,y_1 lie between 0.0 and 1.0 (and sum to 1), we use the softmax function, again introducing weights and biases:

Y=softmax(H \cdot W' + B')

Note the order of the product: H is a 1×3 row and W' is a 3×2 matrix, so H \cdot W' is 1×2, matching the shape of Y.

where the weights and biases are:

W'=
\begin{pmatrix}
w'_{1,1} & w'_{1,2} \\
w'_{2,1} & w'_{2,2} \\
w'_{3,1} & w'_{3,2} \\
\end{pmatrix}
B'=
\begin{pmatrix}
b'_1 & b'_2 \\
\end{pmatrix}

Then we have the complete function expression from the input layer to the output layer:

Y=softmax(\sigma(X \cdot W + B) \cdot W' + B')
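Putting the pieces together, here is a minimal sketch of the full forward pass, with all four parameter matrices as random stand-ins:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

X  = np.random.rand(1, 65536)          # flattened input image
W  = np.random.randn(65536, 3) * 0.01  # input -> hidden weights (W)
B  = np.zeros((1, 3))                  # hidden biases (B)
W2 = np.random.randn(3, 2) * 0.1       # hidden -> output weights (W')
B2 = np.zeros((1, 2))                  # output biases (B')

# Y = softmax(sigma(X . W + B) . W' + B')
H = sigmoid(X @ W + B)
Y = softmax(H @ W2 + B2)
print(Y)  # two probabilities (cat, dog) that sum to 1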

Determination and significance of weights and biases

According to the formula above, given an input picture we can compute the probability that it is a cat or a dog. To ensure the output is correct, the parameters W, B, W', B' must be determined.
Determining the weights and biases of a neural network is a crucial task, because these parameters directly affect the performance and accuracy of the model. Typically, they are determined through a process called training. In this process, we use a large number of labeled examples (such as pictures known to be cats or dogs) to adjust the parameters of the network so that its output is as close to the actual labels as possible.
This adjustment usually involves the following steps (they are only briefly introduced here and will be explained in detail in later chapters; a minimal sketch follows the list):

  1. Forward propagation (Forward Propagation)
    Forward propagation refers to the process of computing from the input layer, through the hidden layer, and up to the output layer. In each layer, the activation value of the neuron is calculated and passed to the next layer until the final output is obtained.

  2. Loss function (Loss Function)
    The loss function is used to measure the difference between the predicted and actual values of the model. In cat and dog classification problems, the commonly used loss function is cross-entropy loss (Cross-Entropy Loss).

  3. Backpropagation (Backpropagation)
    Backpropagation is the key step for adjusting the weights and biases. By calculating the gradient of the loss function with respect to each parameter, we know how to adjust those parameters to reduce the loss. Specifically, backpropagation uses the chain rule (Chain Rule) to compute gradients, and the parameters are then updated based on those gradients, typically with a gradient descent algorithm (Gradient Descent Algorithm).

  4. Gradient Descent Algorithm (Gradient Descent Algorithm)
    The gradient descent algorithm optimizes the loss function step by step through iteration. Specifically, it calculates the gradient of each parameter and then updates the parameters in the opposite direction of the gradient to reduce the loss.
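As promised, here is a toy sketch of step 4 alone: gradient descent on a one-parameter loss L(w) = (w - 3)^2. In a real network the same update applies to every entry of W, B, W', B', with backpropagation supplying the gradients:

w = 0.0    # initial parameter value
lr = 0.1   # learning rate

for step in range(50):
    grad = 2 * (w - 3)  # dL/dw, computed analytically for this toy loss
    w -= lr * grad      # move against the gradient to reduce the loss

print(w)  # converges toward 3, the minimizer of L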

